{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Azure AI Search integrated vectorization sample\n",
    "\n",
    "This Python notebook demonstrates the [integrated vectorization](https://learn.microsoft.com/azure/search/vector-search-integrated-vectorization) features of Azure AI Search that are currently in public preview. \n",
    "\n",
    "Integrated vectorization takes a dependency on indexers and skillsets, using the Text Split skill for data chunking, and the AzureOpenAIEmbedding skill and your Azure OpenAI resorce for embedding.\n",
    "\n",
    "This example uses PDFs from the `data/documents` folder for chunking, embedding, indexing, and queries.\n",
    "\n",
    "### Prerequisites\n",
    "\n",
    "+ An Azure subscription, with [access to Azure OpenAI](https://aka.ms/oai/access).\n",
    " \n",
    "+ Azure AI Search, any tier, but we recommend Basic or higher for this workload. [Enable semantic ranker](https://learn.microsoft.com/azure/search/semantic-how-to-enable-disable) if you want to run a hybrid query with semantic ranking.\n",
    "\n",
    "+ A deployment of the `text-embedding-ada-002` model on Azure OpenAI.\n",
    "\n",
    "+ Azure Blob Storage. This notebook connects to your storage account and loads a container with the sample PDFs.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up a Python virtual environment in Visual Studio Code\n",
    "\n",
    "1. Open the Command Palette (Ctrl+Shift+P).\n",
    "1. Search for **Python: Create Environment**.\n",
    "1. Select **Venv**.\n",
    "1. Select a Python interpreter. Choose 3.10 or later.\n",
    "\n",
    "It can take a minute to set up. If you run into problems, see [Python environments in VS Code](https://code.visualstudio.com/docs/python/environments)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install -r azure-search-integrated-vectorization-sample-requirements.txt --quiet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load .env file (Copy .env-sample to .env and update accordingly)\n",
    "\n",
    "Optionally, you can test the following features of integrated vectorization using this notebook by setting the appropriate environment variables below:\n",
    "\n",
    "1. [OCR](https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-ocr) every page using the built-in OCR functionality. This allows you to add page numbers for every chunk that is extracted. It requires an [AI Services account](https://learn.microsoft.com/en-us/azure/search/cognitive-search-attach-cognitive-services)\n",
    "   1. Set `USE_OCR` to true and specify `AZURE_AI_SERVICES_KEY` if using key-based authentication, and specify `AZURE_AI_SERVICES_ENDPOINT`.\n",
    "1. Use the [Document Layout Skill](https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-document-intelligence-layout) to convert PDFs and other compatible documents to markdown. It requires an [AI Services account](https://learn.microsoft.com/en-us/azure/search/cognitive-search-attach-cognitive-services) and a search service in a [supported region](https://learn.microsoft.com/en-us/azure/search/cognitive-search-attach-cognitive-services)\n",
    "   1. Set `USE_LAYOUT` to true and specify `AZURE_AI_SERVICES_KEY` if using key-based authentication, and specify `AZURE_AI_SERVICES_ENDPOINT`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dotenv import load_dotenv\n",
    "from azure.identity import DefaultAzureCredential\n",
    "from azure.core.credentials import AzureKeyCredential\n",
    "import os\n",
    "\n",
    "load_dotenv(override=True) # take environment variables from .env.\n",
    "\n",
    "# Variables not used here do not need to be updated in your .env file\n",
    "endpoint = os.environ[\"AZURE_SEARCH_SERVICE_ENDPOINT\"]\n",
    "credential = AzureKeyCredential(os.getenv(\"AZURE_SEARCH_ADMIN_KEY\")) if os.getenv(\"AZURE_SEARCH_ADMIN_KEY\") else DefaultAzureCredential()\n",
    "index_name = os.getenv(\"AZURE_SEARCH_INDEX\", \"int-vec\")\n",
    "blob_connection_string = os.environ[\"BLOB_CONNECTION_STRING\"]\n",
    "# search blob datasource connection string is optional - defaults to blob connection string\n",
    "# This field is only necessary if you are using MI to connect to the data source\n",
    "# https://learn.microsoft.com/azure/search/search-howto-indexing-azure-blob-storage#supported-credentials-and-connection-strings\n",
    "search_blob_connection_string = os.getenv(\"SEARCH_BLOB_DATASOURCE_CONNECTION_STRING\", blob_connection_string)\n",
    "blob_container_name = os.getenv(\"BLOB_CONTAINER_NAME\", \"int-vec\")\n",
    "azure_openai_endpoint = os.environ[\"AZURE_OPENAI_ENDPOINT\"]\n",
    "azure_openai_key = os.getenv(\"AZURE_OPENAI_KEY\")\n",
    "azure_openai_embedding_deployment = os.getenv(\"AZURE_OPENAI_EMBEDDING_DEPLOYMENT\", \"text-embedding-3-large\")\n",
    "azure_openai_model_name = os.getenv(\"AZURE_OPENAI_EMBEDDING_MODEL_NAME\", \"text-embedding-3-large\")\n",
    "azure_openai_model_dimensions = int(os.getenv(\"AZURE_OPENAI_EMBEDDING_DIMENSIONS\", 1024))\n",
    "# This field is only necessary if you want to use OCR to scan PDFs in the datasource or use the Document Layout skill without a key\n",
    "azure_ai_services_endpoint = os.getenv(\"AZURE_AI_SERVICES_ENDPOINT\", \"\")\n",
    "# This field is only necessary if you want to use OCR to scan PDFs in the data source or use the Document Layout skill and you want to authenticate using a key to Azure AI Services\n",
    "azure_ai_services_key = os.getenv(\"AZURE_AI_SERVICES_KEY\", \"\")\n",
    "\n",
    "# set USE_OCR to enable OCR to add page numbers. It cannot be combined with the document layout skill\n",
    "use_ocr = os.getenv(\"USE_OCR\", \"false\") == \"true\"\n",
    "# set USE_LAYOUT to enable Document Intelligence Layout skill for chunking by markdown. It cannot be combined with the built-in OCR\n",
    "use_document_layout = os.getenv(\"USE_LAYOUT\", \"false\") == \"true\"\n",
    "# set USE_MARKDOWN to enable parsing markdown files in the blob container. It cannot be combined with the built-in OCR or document layout skill\n",
    "use_markdown = os.getenv(\"USE_MARKDOWN\", \"false\") == \"true\"\n",
    "# Deepest nesting level in markdown that should be considered. See https://learn.microsoft.com/azure/search/cognitive-search-skill-document-intelligence-layout to learn more\n",
    "document_layout_depth = os.getenv(\"LAYOUT_MARKDOWN_HEADER_DEPTH\", \"h3\")\n",
    "# OCR must be used to add page numbers\n",
    "add_page_numbers = use_ocr\n",
    "\n",
    "count_enabled = sum([use_ocr, use_document_layout, use_markdown])\n",
    "if count_enabled >= 2:\n",
    "    raise Exception(f\"Please enable only one of OCR, Layout or Markdown.\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Connect to Blob Storage and load documents\n",
    "\n",
    "Retrieve documents from Blob Storage. You can use the sample documents in the data/documents folder.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Setup sample data in int-vec-markdown\n"
     ]
    }
   ],
   "source": [
    "from azure.storage.blob import BlobServiceClient  \n",
    "import glob\n",
    "\n",
    "sample_docs_directory = os.path.join(\"..\", \"..\", \"..\", \"data\", \"benefitdocs\")\n",
    "sample_ocr_docs_directory = os.path.join(\"..\", \"..\", \"..\", \"data\", \"ocrdocuments\")\n",
    "sample_layout_docs_directory = os.path.join(\"..\", \"..\", \"..\", \"data\", \"layoutdocuments\")\n",
    "sample_markdown_docs_directory = os.path.join(\"..\", \"..\", \"..\", \"data\", \"markdowndocuments\")\n",
    "\n",
    "def upload_sample_documents(\n",
    "        blob_connection_string: str,\n",
    "        blob_container_name: str,\n",
    "        documents_directory: str,\n",
    "        # Set to false if you want to use credentials included in the blob connection string\n",
    "        # Otherwise your identity will be used as credentials\n",
    "        use_user_identity: bool = True\n",
    "    ):\n",
    "        # Connect to Blob Storage\n",
    "        blob_service_client = BlobServiceClient.from_connection_string(logging_enable=True, conn_str=blob_connection_string, credential=DefaultAzureCredential() if use_user_identity else None)\n",
    "        container_client = blob_service_client.get_container_client(blob_container_name)\n",
    "        if not container_client.exists():\n",
    "            container_client.create_container()\n",
    "\n",
    "        files = glob.glob(os.path.join(documents_directory, '*'))\n",
    "        for file in files:\n",
    "            with open(file, \"rb\") as data:\n",
    "                name = os.path.basename(file)\n",
    "                if not container_client.get_blob_client(name).exists():\n",
    "                    container_client.upload_blob(name=name, data=data)\n",
    "\n",
    "docs_directory = sample_docs_directory\n",
    "\n",
    "if use_ocr:\n",
    "    docs_directory = sample_ocr_docs_directory\n",
    "elif use_document_layout:\n",
    "    docs_directory = sample_layout_docs_directory\n",
    "elif use_markdown:\n",
    "    docs_directory = sample_markdown_docs_directory\n",
    "\n",
    "upload_sample_documents(\n",
    "    blob_connection_string=blob_connection_string,\n",
    "    blob_container_name=blob_container_name,\n",
    "    documents_directory = docs_directory)\n",
    "\n",
    "print(f\"Setup sample data in {blob_container_name}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a blob data source connector on Azure AI Search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Data source 'int-vec-markdown-blob' created or updated\n"
     ]
    }
   ],
   "source": [
    "from azure.search.documents.indexes import SearchIndexerClient\n",
    "from azure.search.documents.indexes.models import (\n",
    "    SearchIndexerDataContainer,\n",
    "    SearchIndexerDataSourceConnection\n",
    ")\n",
    "from azure.search.documents.indexes.models import NativeBlobSoftDeleteDeletionDetectionPolicy\n",
    "\n",
    "# Create a data source \n",
    "indexer_client = SearchIndexerClient(endpoint, credential)\n",
    "container = SearchIndexerDataContainer(name=blob_container_name)\n",
    "data_source_connection = SearchIndexerDataSourceConnection(\n",
    "    name=f\"{index_name}-blob\",\n",
    "    type=\"azureblob\",\n",
    "    connection_string=search_blob_connection_string,\n",
    "    container=container,\n",
    "    data_deletion_detection_policy=NativeBlobSoftDeleteDeletionDetectionPolicy() if not use_markdown else None\n",
    ")\n",
    "data_source = indexer_client.create_or_update_data_source_connection(data_source_connection)\n",
    "\n",
    "print(f\"Data source '{data_source.name}' created or updated\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a search index\n",
    "\n",
    "Vector and nonvector content is stored in a search index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "int-vec-markdown created\n"
     ]
    }
   ],
   "source": [
    "from azure.search.documents.indexes import SearchIndexClient\n",
    "from azure.search.documents.indexes.models import (\n",
    "    SearchField,\n",
    "    SearchFieldDataType,\n",
    "    VectorSearch,\n",
    "    HnswAlgorithmConfiguration,\n",
    "    VectorSearchProfile,\n",
    "    AzureOpenAIVectorizer,\n",
    "    AzureOpenAIVectorizerParameters,\n",
    "    SemanticConfiguration,\n",
    "    SemanticSearch,\n",
    "    SemanticPrioritizedFields,\n",
    "    SemanticField,\n",
    "    SearchIndex\n",
    ")\n",
    "\n",
    "# Create a search index  \n",
    "index_client = SearchIndexClient(endpoint=endpoint, credential=credential)  \n",
    "fields = [  \n",
    "    SearchField(name=\"parent_id\", type=SearchFieldDataType.String, sortable=True, filterable=True, facetable=True),  \n",
    "    SearchField(name=\"title\", type=SearchFieldDataType.String),  \n",
    "    SearchField(name=\"chunk_id\", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True, analyzer_name=\"keyword\"),  \n",
    "    SearchField(name=\"chunk\", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),  \n",
    "    SearchField(name=\"vector\", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), vector_search_dimensions=azure_openai_model_dimensions, vector_search_profile_name=\"myHnswProfile\"),  \n",
    "]\n",
    "\n",
    "if add_page_numbers:\n",
    "    fields.append(\n",
    "        SearchField(name=\"page_number\", type=SearchFieldDataType.String, sortable=True, filterable=True, facetable=False)\n",
    "    )\n",
    "\n",
    "if use_document_layout or use_markdown:\n",
    "    fields.extend([\n",
    "        SearchField(name=\"header_1\", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),\n",
    "        SearchField(name=\"header_2\", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),\n",
    "        SearchField(name=\"header_3\", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False)\n",
    "    ])\n",
    "  \n",
    "# Configure the vector search configuration  \n",
    "vector_search = VectorSearch(  \n",
    "    algorithms=[  \n",
    "        HnswAlgorithmConfiguration(name=\"myHnsw\"),\n",
    "    ],  \n",
    "    profiles=[  \n",
    "        VectorSearchProfile(  \n",
    "            name=\"myHnswProfile\",  \n",
    "            algorithm_configuration_name=\"myHnsw\",  \n",
    "            vectorizer_name=\"myOpenAI\",  \n",
    "        )\n",
    "    ],  \n",
    "    vectorizers=[  \n",
    "        AzureOpenAIVectorizer(  \n",
    "            vectorizer_name=\"myOpenAI\",  \n",
    "            kind=\"azureOpenAI\",  \n",
    "            parameters=AzureOpenAIVectorizerParameters(  \n",
    "                resource_url=azure_openai_endpoint,  \n",
    "                deployment_name=azure_openai_embedding_deployment,\n",
    "                model_name=azure_openai_model_name,\n",
    "                api_key=azure_openai_key,\n",
    "            ),\n",
    "        ),  \n",
    "    ],  \n",
    ")  \n",
    "  \n",
    "semantic_config = SemanticConfiguration(  \n",
    "    name=\"my-semantic-config\",  \n",
    "    prioritized_fields=SemanticPrioritizedFields(  \n",
    "        content_fields=[SemanticField(field_name=\"chunk\")],\n",
    "        title_field=SemanticField(field_name=\"title\")\n",
    "    ),  \n",
    ")\n",
    "  \n",
    "# Create the semantic search with the configuration  \n",
    "semantic_search = SemanticSearch(configurations=[semantic_config])  \n",
    "  \n",
    "# Create the search index\n",
    "index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search, semantic_search=semantic_search)  \n",
    "result = index_client.create_or_update_index(index)  \n",
    "print(f\"{result.name} created\")  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a skillset\n",
    "\n",
    "Skills drive integrated vectorization. [Text Split](https://learn.microsoft.com/azure/search/cognitive-search-skill-textsplit) provides data chunking. [AzureOpenAIEmbedding](https://learn.microsoft.com/azure/search/cognitive-search-skill-azure-openai-embedding) handles calls to Azure OpenAI, using the connection information you provide in the environment variables. An [indexer projection](https://learn.microsoft.com/azure/search/index-projections-concept-intro) specifies secondary indexes used for chunked data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "int-vec-markdown-skillset created\n"
     ]
    }
   ],
   "source": [
    "from azure.search.documents.indexes.models import (\n",
    "    SplitSkill,\n",
    "    InputFieldMappingEntry,\n",
    "    OutputFieldMappingEntry,\n",
    "    AzureOpenAIEmbeddingSkill,\n",
    "    OcrSkill,\n",
    "    SearchIndexerIndexProjection,\n",
    "    SearchIndexerIndexProjectionSelector,\n",
    "    SearchIndexerIndexProjectionsParameters,\n",
    "    IndexProjectionMode,\n",
    "    SearchIndexerSkillset,\n",
    "    AIServicesAccountKey,\n",
    "    AIServicesAccountIdentity,\n",
    "    DocumentIntelligenceLayoutSkill\n",
    ")\n",
    "\n",
    "# Create a skillset name \n",
    "skillset_name = f\"{index_name}-skillset\"\n",
    "\n",
    "def create_ocr_skillset():\n",
    "    ocr_skill = OcrSkill(\n",
    "        description=\"OCR skill to scan PDFs and other images with text\",\n",
    "        context=\"/document/normalized_images/*\",\n",
    "        line_ending=\"Space\",\n",
    "        default_language_code=\"en\",\n",
    "        should_detect_orientation=True,\n",
    "        inputs=[\n",
    "            InputFieldMappingEntry(name=\"image\", source=\"/document/normalized_images/*\")\n",
    "        ],\n",
    "        outputs=[\n",
    "            OutputFieldMappingEntry(name=\"text\", target_name=\"text\"),\n",
    "            OutputFieldMappingEntry(name=\"layoutText\", target_name=\"layoutText\")\n",
    "        ]\n",
    "    )\n",
    "\n",
    "    split_skill = SplitSkill(  \n",
    "        description=\"Split skill to chunk documents\",  \n",
    "        text_split_mode=\"pages\",  \n",
    "        context=\"/document/normalized_images/*\",  \n",
    "        maximum_page_length=2000,  \n",
    "        page_overlap_length=500,  \n",
    "        inputs=[  \n",
    "            InputFieldMappingEntry(name=\"text\", source=\"/document/normalized_images/*/text\"),  \n",
    "        ],  \n",
    "        outputs=[  \n",
    "            OutputFieldMappingEntry(name=\"textItems\", target_name=\"pages\")  \n",
    "        ]\n",
    "    )\n",
    "\n",
    "    embedding_skill = AzureOpenAIEmbeddingSkill(  \n",
    "        description=\"Skill to generate embeddings via Azure OpenAI\",  \n",
    "        context=\"/document/normalized_images/*/pages/*\",  \n",
    "        resource_url=azure_openai_endpoint,  \n",
    "        deployment_name=azure_openai_embedding_deployment,  \n",
    "        model_name=azure_openai_model_name,\n",
    "        dimensions=azure_openai_model_dimensions,\n",
    "        api_key=azure_openai_key,  \n",
    "        inputs=[  \n",
    "            InputFieldMappingEntry(name=\"text\", source=\"/document/normalized_images/*/pages/*\"),  \n",
    "        ],  \n",
    "        outputs=[\n",
    "            OutputFieldMappingEntry(name=\"embedding\", target_name=\"vector\")  \n",
    "        ]\n",
    "    )\n",
    "\n",
    "    index_projections = SearchIndexerIndexProjection(  \n",
    "        selectors=[  \n",
    "            SearchIndexerIndexProjectionSelector(  \n",
    "                target_index_name=index_name,  \n",
    "                parent_key_field_name=\"parent_id\",  \n",
    "                source_context=\"/document/normalized_images/*/pages/*\",  \n",
    "                mappings=[\n",
    "                    InputFieldMappingEntry(name=\"chunk\", source=\"/document/normalized_images/*/pages/*\"),  \n",
    "                    InputFieldMappingEntry(name=\"vector\", source=\"/document/normalized_images/*/pages/*/vector\"),\n",
    "                    InputFieldMappingEntry(name=\"title\", source=\"/document/metadata_storage_name\"),\n",
    "                    InputFieldMappingEntry(name=\"page_number\", source=\"/document/normalized_images/*/pageNumber\")\n",
    "                ]\n",
    "            )\n",
    "        ],  \n",
    "        parameters=SearchIndexerIndexProjectionsParameters(  \n",
    "            projection_mode=IndexProjectionMode.SKIP_INDEXING_PARENT_DOCUMENTS  \n",
    "        )  \n",
    "    )\n",
    "\n",
    "    skills = [ocr_skill, split_skill, embedding_skill]\n",
    "\n",
    "    return SearchIndexerSkillset(  \n",
    "        name=skillset_name,  \n",
    "        description=\"Skillset to chunk documents and generating embeddings\",  \n",
    "        skills=skills,  \n",
    "        index_projection=index_projections,\n",
    "        cognitive_services_account=AIServicesAccountKey(key=azure_ai_services_key, subdomain_url=azure_ai_services_endpoint) if azure_ai_services_key else AIServicesAccountIdentity(identity=None, subdomain_url=azure_ai_services_endpoint)\n",
    "    )\n",
    "\n",
    "def create_layout_skillset():\n",
    "    layout_skill = DocumentIntelligenceLayoutSkill(\n",
    "        description=\"Layout skill to read documents\",\n",
    "        context=\"/document\",\n",
    "        output_mode=\"oneToMany\",\n",
    "        markdown_header_depth=\"h3\",\n",
    "        inputs=[\n",
    "            InputFieldMappingEntry(name=\"file_data\", source=\"/document/file_data\")\n",
    "        ],\n",
    "        outputs=[\n",
    "            OutputFieldMappingEntry(name=\"markdown_document\", target_name=\"markdownDocument\")\n",
    "        ]\n",
    "    )\n",
    "\n",
    "    split_skill = SplitSkill(  \n",
    "        description=\"Split skill to chunk documents\",  \n",
    "        text_split_mode=\"pages\",  \n",
    "        context=\"/document/markdownDocument/*\",  \n",
    "        maximum_page_length=2000,  \n",
    "        page_overlap_length=500,  \n",
    "        inputs=[  \n",
    "            InputFieldMappingEntry(name=\"text\", source=\"/document/markdownDocument/*/content\"),  \n",
    "        ],  \n",
    "        outputs=[  \n",
    "            OutputFieldMappingEntry(name=\"textItems\", target_name=\"pages\")  \n",
    "        ]\n",
    "    )\n",
    "\n",
    "    embedding_skill = AzureOpenAIEmbeddingSkill(  \n",
    "        description=\"Skill to generate embeddings via Azure OpenAI\",  \n",
    "        context=\"/document/markdownDocument/*/pages/*\",  \n",
    "        resource_url=azure_openai_endpoint,  \n",
    "        deployment_name=azure_openai_embedding_deployment, \n",
    "        model_name=azure_openai_model_name,\n",
    "        dimensions=azure_openai_model_dimensions,\n",
    "        api_key=azure_openai_key,\n",
    "        inputs=[  \n",
    "            InputFieldMappingEntry(name=\"text\", source=\"/document/markdownDocument/*/pages/*\"),  \n",
    "        ],  \n",
    "        outputs=[\n",
    "            OutputFieldMappingEntry(name=\"embedding\", target_name=\"vector\")  \n",
    "        ]\n",
    "    )\n",
    "\n",
    "    index_projections = SearchIndexerIndexProjection(  \n",
    "        selectors=[  \n",
    "            SearchIndexerIndexProjectionSelector(  \n",
    "                target_index_name=index_name,  \n",
    "                parent_key_field_name=\"parent_id\",  \n",
    "                source_context=\"/document/markdownDocument/*/pages/*\",  \n",
    "                mappings=[\n",
    "                    InputFieldMappingEntry(name=\"chunk\", source=\"/document/markdownDocument/*/pages/*\"),  \n",
    "                    InputFieldMappingEntry(name=\"vector\", source=\"/document/markdownDocument/*/pages/*/vector\"),\n",
    "                    InputFieldMappingEntry(name=\"title\", source=\"/document/metadata_storage_name\"),\n",
    "                    InputFieldMappingEntry(name=\"header_1\", source=\"/document/markdownDocument/*/sections/h1\"),\n",
    "                    InputFieldMappingEntry(name=\"header_2\", source=\"/document/markdownDocument/*/sections/h2\"),\n",
    "                    InputFieldMappingEntry(name=\"header_3\", source=\"/document/markdownDocument/*/sections/h3\"),\n",
    "                ]\n",
    "            )\n",
    "        ],  \n",
    "        parameters=SearchIndexerIndexProjectionsParameters(  \n",
    "            projection_mode=IndexProjectionMode.SKIP_INDEXING_PARENT_DOCUMENTS  \n",
    "        )  \n",
    "    )\n",
    "\n",
    "    skills = [layout_skill, split_skill, embedding_skill]\n",
    "\n",
    "    return SearchIndexerSkillset(  \n",
    "        name=skillset_name,  \n",
    "        description=\"Skillset to chunk documents and generating embeddings\",  \n",
    "        skills=skills,  \n",
    "        index_projection=index_projections,\n",
    "        cognitive_services_account=AIServicesAccountKey(key=azure_ai_services_key, subdomain_url=azure_ai_services_endpoint) if azure_ai_services_key else AIServicesAccountIdentity(identity=None, subdomain_url=azure_ai_services_endpoint)\n",
    "    )\n",
    "\n",
    "def create_markdown_skillset():\n",
    "    split_skill = SplitSkill(  \n",
    "        description=\"Split skill to chunk documents\",  \n",
    "        text_split_mode=\"pages\",  \n",
    "        context=\"/document\",  \n",
    "        maximum_page_length=2000,  \n",
    "        page_overlap_length=500,  \n",
    "        inputs=[  \n",
    "            InputFieldMappingEntry(name=\"text\", source=\"/document/content\"),  \n",
    "        ],  \n",
    "        outputs=[  \n",
    "            OutputFieldMappingEntry(name=\"textItems\", target_name=\"pages\")  \n",
    "        ]\n",
    "    )\n",
    "\n",
    "    embedding_skill = AzureOpenAIEmbeddingSkill(  \n",
    "        description=\"Skill to generate embeddings via Azure OpenAI\",  \n",
    "        context=\"/document/pages/*\",  \n",
    "        resource_url=azure_openai_endpoint,  \n",
    "        deployment_name=azure_openai_embedding_deployment,\n",
    "        model_name=azure_openai_model_name,\n",
    "        dimensions=azure_openai_model_dimensions,\n",
    "        api_key=azure_openai_key,  \n",
    "        inputs=[  \n",
    "            InputFieldMappingEntry(name=\"text\", source=\"/document/pages/*\"),  \n",
    "        ],  \n",
    "        outputs=[\n",
    "            OutputFieldMappingEntry(name=\"embedding\", target_name=\"vector\")  \n",
    "        ]\n",
    "    )\n",
    "\n",
    "    index_projections = SearchIndexerIndexProjection(  \n",
    "        selectors=[  \n",
    "            SearchIndexerIndexProjectionSelector(  \n",
    "                target_index_name=index_name,  \n",
    "                parent_key_field_name=\"parent_id\",  \n",
    "                source_context=\"/document/pages/*\",  \n",
    "                mappings=[\n",
    "                    InputFieldMappingEntry(name=\"chunk\", source=\"/document/pages/*\"),  \n",
    "                    InputFieldMappingEntry(name=\"vector\", source=\"/document/pages/*/vector\"),\n",
    "                    InputFieldMappingEntry(name=\"title\", source=\"/document/metadata_storage_name\"),\n",
    "                    InputFieldMappingEntry(name=\"header_1\", source=\"/document/sections/h1\"),\n",
    "                    InputFieldMappingEntry(name=\"header_2\", source=\"/document/sections/h2\"),\n",
    "                    InputFieldMappingEntry(name=\"header_3\", source=\"/document/sections/h3\"),\n",
    "                ]\n",
    "            )\n",
    "        ],  \n",
    "        parameters=SearchIndexerIndexProjectionsParameters(  \n",
    "            projection_mode=IndexProjectionMode.SKIP_INDEXING_PARENT_DOCUMENTS  \n",
    "        )  \n",
    "    )\n",
    "\n",
    "    skills = [split_skill, embedding_skill]\n",
    "\n",
    "    return SearchIndexerSkillset(  \n",
    "        name=skillset_name,  \n",
    "        description=\"Skillset to chunk documents and generating embeddings\",  \n",
    "        skills=skills,  \n",
    "        index_projection=index_projections,\n",
    "        cognitive_services_account=AIServicesAccountKey(key=azure_ai_services_key, subdomain_url=azure_ai_services_endpoint) if azure_ai_services_key else AIServicesAccountIdentity(identity=None, subdomain_url=azure_ai_services_endpoint)\n",
    "    )\n",
    "\n",
    "def create_skillset():\n",
    "    split_skill = SplitSkill(  \n",
    "        description=\"Split skill to chunk documents\",  \n",
    "        text_split_mode=\"pages\",  \n",
    "        context=\"/document\",  \n",
    "        maximum_page_length=2000,  \n",
    "        page_overlap_length=500,  \n",
    "        inputs=[  \n",
    "            InputFieldMappingEntry(name=\"text\", source=\"/document/content\"),  \n",
    "        ],  \n",
    "        outputs=[  \n",
    "            OutputFieldMappingEntry(name=\"textItems\", target_name=\"pages\")  \n",
    "        ]\n",
    "    )\n",
    "\n",
    "    embedding_skill = AzureOpenAIEmbeddingSkill(  \n",
    "        description=\"Skill to generate embeddings via Azure OpenAI\",  \n",
    "        context=\"/document/pages/*\",  \n",
    "        resource_url=azure_openai_endpoint,  \n",
    "        deployment_name=azure_openai_embedding_deployment,  \n",
    "        model_name=azure_openai_model_name,\n",
    "        dimensions=azure_openai_model_dimensions,\n",
    "        api_key=azure_openai_key,  \n",
    "        inputs=[  \n",
    "            InputFieldMappingEntry(name=\"text\", source=\"/document/pages/*\"),  \n",
    "        ],  \n",
    "        outputs=[\n",
    "            OutputFieldMappingEntry(name=\"embedding\", target_name=\"vector\")  \n",
    "        ]\n",
    "    )\n",
    "\n",
    "    index_projections = SearchIndexerIndexProjection(  \n",
    "        selectors=[  \n",
    "            SearchIndexerIndexProjectionSelector(  \n",
    "                target_index_name=index_name,  \n",
    "                parent_key_field_name=\"parent_id\",  \n",
    "                source_context=\"/document/pages/*\",  \n",
    "                mappings=[\n",
    "                    InputFieldMappingEntry(name=\"chunk\", source=\"/document/pages/*\"),  \n",
    "                    InputFieldMappingEntry(name=\"vector\", source=\"/document/pages/*/vector\"),\n",
    "                    InputFieldMappingEntry(name=\"title\", source=\"/document/metadata_storage_name\")\n",
    "                ]\n",
    "            )\n",
    "        ],  \n",
    "        parameters=SearchIndexerIndexProjectionsParameters(  \n",
    "            projection_mode=IndexProjectionMode.SKIP_INDEXING_PARENT_DOCUMENTS  \n",
    "        )  \n",
    "    )\n",
    "\n",
    "    skills = [split_skill, embedding_skill]\n",
    "\n",
    "    return SearchIndexerSkillset(  \n",
    "        name=skillset_name,  \n",
    "        description=\"Skillset to chunk documents and generating embeddings\",  \n",
    "        skills=skills,  \n",
    "        index_projection=index_projections\n",
    "    )\n",
    "\n",
    "skillset = create_ocr_skillset() if use_ocr else create_layout_skillset() if use_document_layout else create_markdown_skillset() if use_markdown else create_skillset()\n",
    "  \n",
    "client = SearchIndexerClient(endpoint, credential)  \n",
    "client.create_or_update_skillset(skillset)  \n",
    "print(f\"{skillset.name} created\")  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create an indexer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " int-vec-markdown-indexer is created and running. If queries return no results, please wait a bit and try again.\n"
     ]
    }
   ],
   "source": [
    "from azure.search.documents.indexes.models import (\n",
    "    SearchIndexer,\n",
    "    IndexingParameters,\n",
    "    IndexingParametersConfiguration,\n",
    "    BlobIndexerImageAction,\n",
    ")\n",
    "\n",
    "# Create an indexer  \n",
    "indexer_name = f\"{index_name}-indexer\"  \n",
    "\n",
    "indexer_parameters = None\n",
    "if use_ocr:\n",
    "    indexer_parameters = IndexingParameters(\n",
    "        configuration=IndexingParametersConfiguration(\n",
    "            image_action=BlobIndexerImageAction.GENERATE_NORMALIZED_IMAGE_PER_PAGE,\n",
    "            query_timeout=None))\n",
    "elif use_document_layout:\n",
    "    indexer_parameters = IndexingParameters(\n",
    "        configuration=IndexingParametersConfiguration(\n",
    "            allow_skillset_to_read_file_data=True,\n",
    "            query_timeout=None))\n",
    "elif use_markdown:\n",
    "    indexer_parameters = IndexingParameters(\n",
    "        configuration=IndexingParametersConfiguration(\n",
    "            parsing_mode=\"markdown\",\n",
    "            markdown_parsing_submode=\"oneToMany\",\n",
    "            markdown_header_depth=document_layout_depth,\n",
    "            query_timeout=None))\n",
    "\n",
    "indexer = SearchIndexer(  \n",
    "    name=indexer_name,  \n",
    "    description=\"Indexer to index documents and generate embeddings\",  \n",
    "    skillset_name=skillset_name,  \n",
    "    target_index_name=index_name,  \n",
    "    data_source_name=data_source.name,\n",
    "    parameters=indexer_parameters\n",
    ")  \n",
    "\n",
    "indexer_client = SearchIndexerClient(endpoint, credential)  \n",
    "indexer_result = indexer_client.create_or_update_indexer(indexer)  \n",
    "  \n",
    "# Run the indexer  \n",
    "indexer_client.run_indexer(indexer_name)  \n",
    "print(f' {indexer_name} is created and running. If queries return no results, please wait a bit and try again.')  \n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Perform a vector similarity search"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example shows a pure vector search using the vectorizable text query, all you need to do is pass in text and your vectorizer will handle the query vectorization.\n",
    "\n",
    "If you indexed the health plan PDF file, send queries that ask plan-related questions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "parent_id: aHR0cHM6Ly9tYWdvdHRlaWFkbHNnZW4yLmJsb2IuY29yZS53aW5kb3dzLm5ldC9pbnQtdmVjLW1hcmtkb3duL0JlbmVmaXRfT3B0aW9ucy5tZDs00\n",
      "chunk_id: d9167148bb62_aHR0cHM6Ly9tYWdvdHRlaWFkbHNnZW4yLmJsb2IuY29yZS53aW5kb3dzLm5ldC9pbnQtdmVjLW1hcmtkb3duL0JlbmVmaXRfT3B0aW9ucy5tZDs00_pages_0\n",
      "Score: 0.8225844\n",
      "Content: Both plans offer coverage for routine physicals, well-child visits, immunizations, and other preventive\n",
      "care services. The plans also cover preventive care services such as mammograms, colonoscopies, and\n",
      "other cancer screenings.\n",
      "\n",
      "Northwind Health Plus offers more comprehensive coverage than Northwind Standard. This plan offers\n",
      "coverage for emergency services, both in-network and out-of-network, as well as mental health and\n",
      "substance abuse coverage. Northwind Standard does not offer coverage for emergency services, mental\n",
      "health and substance abuse coverage, or out-of-network services.\n",
      "\n",
      "Both plans offer coverage for prescription drugs. Northwind Health Plus offers a wider range of\n",
      "prescription drug coverage than Northwind Standard. Northwind Health Plus covers generic, brand-\n",
      "name, and specialty drugs, while Northwind Standard only covers generic and brand-name drugs.\n",
      "\n",
      "Both plans offer coverage for vision and dental services. Northwind Health Plus offers coverage for vision\n",
      "exams, glasses, and contact lenses, as well as dental exams, cleanings, and fillings. Northwind Standard\n",
      "only offers coverage for vision exams and glasses.\n",
      "\n",
      "Both plans offer coverage for medical services. Northwind Health Plus offers coverage for hospital stays,\n",
      "doctor visits, lab tests, and X-rays. Northwind Standard only offers coverage for doctor visits and lab\n",
      "tests.\n",
      "\n",
      "Northwind Health Plus is a comprehensive plan that offers more coverage than Northwind Standard.\n",
      "Northwind Health Plus offers coverage for emergency services, mental health and substance abuse\n",
      "coverage, and out-of-network services, while Northwind Standard does not. Northwind Health Plus also\n",
      "\n",
      "<!-- PageBreak -->\n",
      "\n",
      "offers a wider range of prescription drug coverage than Northwind Standard. Both plans offer coverage\n",
      "for vision and dental services, as well as medical services.\n"
     ]
    }
   ],
   "source": [
    "from azure.search.documents import SearchClient\n",
    "from azure.search.documents.models import VectorizableTextQuery\n",
    "\n",
    "# Pure Vector Search\n",
    "query = \"Which is more comprehensive, Northwind Health Plus vs Northwind Standard?\"\n",
    "if use_ocr:\n",
    "    query = \"Who is the national director?\"\n",
    "if use_document_layout:\n",
    "    query = \"What is contoso?\"\n",
    "  \n",
    "search_client = SearchClient(endpoint, index_name, credential=credential)\n",
    "vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=1, fields=\"vector\", exhaustive=True)\n",
    "# Use the below query to pass in the raw vector query instead of the query vectorization\n",
    "# vector_query = RawVectorQuery(vector=generate_embeddings(query), k_nearest_neighbors=3, fields=\"vector\")\n",
    "  \n",
    "results = search_client.search(  \n",
    "    search_text=None,  \n",
    "    vector_queries= [vector_query],\n",
    "    top=1\n",
    ")  \n",
    "  \n",
    "for result in results:  \n",
    "    print(f\"parent_id: {result['parent_id']}\")  \n",
    "    print(f\"chunk_id: {result['chunk_id']}\")  \n",
    "    if add_page_numbers:\n",
    "        print(f\"page_number: {result['page_number']}\")\n",
    "    print(f\"Score: {result['@search.score']}\")  \n",
    "    print(f\"Content: {result['chunk']}\")   \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Perform a hybrid search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "parent_id: aHR0cHM6Ly9tYWdvdHRlaWFkbHNnZW4yLmJsb2IuY29yZS53aW5kb3dzLm5ldC9pbnQtdmVjLW1hcmtkb3duL0JlbmVmaXRfT3B0aW9ucy5tZDs00\n",
      "chunk_id: d9167148bb62_aHR0cHM6Ly9tYWdvdHRlaWFkbHNnZW4yLmJsb2IuY29yZS53aW5kb3dzLm5ldC9pbnQtdmVjLW1hcmtkb3duL0JlbmVmaXRfT3B0aW9ucy5tZDs00_pages_0\n",
      "Score: 0.03159204125404358\n",
      "Content: Both plans offer coverage for routine physicals, well-child visits, immunizations, and other preventive\n",
      "care services. The plans also cover preventive care services such as mammograms, colonoscopies, and\n",
      "other cancer screenings.\n",
      "\n",
      "Northwind Health Plus offers more comprehensive coverage than Northwind Standard. This plan offers\n",
      "coverage for emergency services, both in-network and out-of-network, as well as mental health and\n",
      "substance abuse coverage. Northwind Standard does not offer coverage for emergency services, mental\n",
      "health and substance abuse coverage, or out-of-network services.\n",
      "\n",
      "Both plans offer coverage for prescription drugs. Northwind Health Plus offers a wider range of\n",
      "prescription drug coverage than Northwind Standard. Northwind Health Plus covers generic, brand-\n",
      "name, and specialty drugs, while Northwind Standard only covers generic and brand-name drugs.\n",
      "\n",
      "Both plans offer coverage for vision and dental services. Northwind Health Plus offers coverage for vision\n",
      "exams, glasses, and contact lenses, as well as dental exams, cleanings, and fillings. Northwind Standard\n",
      "only offers coverage for vision exams and glasses.\n",
      "\n",
      "Both plans offer coverage for medical services. Northwind Health Plus offers coverage for hospital stays,\n",
      "doctor visits, lab tests, and X-rays. Northwind Standard only offers coverage for doctor visits and lab\n",
      "tests.\n",
      "\n",
      "Northwind Health Plus is a comprehensive plan that offers more coverage than Northwind Standard.\n",
      "Northwind Health Plus offers coverage for emergency services, mental health and substance abuse\n",
      "coverage, and out-of-network services, while Northwind Standard does not. Northwind Health Plus also\n",
      "\n",
      "<!-- PageBreak -->\n",
      "\n",
      "offers a wider range of prescription drug coverage than Northwind Standard. Both plans offer coverage\n",
      "for vision and dental services, as well as medical services.\n"
     ]
    }
   ],
   "source": [
    "# Hybrid Search\n",
    "query = \"Which is more comprehensive, Northwind Health Plus vs Northwind Standard?\"  \n",
    "if use_ocr:\n",
    "    query = \"Who is the national director?\"\n",
    "if use_document_layout:\n",
    "    query = \"What is contoso?\"\n",
    "\n",
    "search_client = SearchClient(endpoint, index_name, credential=credential)\n",
    "vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=1, fields=\"vector\", exhaustive=True)\n",
    "  \n",
    "results = search_client.search(  \n",
    "    search_text=query,  \n",
    "    vector_queries= [vector_query],\n",
    "    select=[\"parent_id\", \"chunk_id\", \"chunk\"],\n",
    "    top=1\n",
    ")  \n",
    "  \n",
    "for result in results:  \n",
    "    print(f\"parent_id: {result['parent_id']}\")  \n",
    "    print(f\"chunk_id: {result['chunk_id']}\")  \n",
    "    print(f\"Score: {result['@search.score']}\")  \n",
    "    print(f\"Content: {result['chunk']}\")  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Perform a hybrid search + semantic reranking"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Semantic Answer: If an individual is not able to afford the cost of habilitation services, they may want to consider seeking assistance from a state... Overall, the<em> Northwind Health Plus </em>plan provides<em> comprehensive coverage </em>for habilitation services. It is important to understand the coverage limits and exceptions for habilitation services before seeking treatment.\n",
      "Semantic Answer Score: 0.9919999837875366\n",
      "\n",
      "parent_id: aHR0cHM6Ly9tYWdvdHRlaWFkbHNnZW4yLmJsb2IuY29yZS53aW5kb3dzLm5ldC9pbnQtdmVjLW1hcmtkb3duL0JlbmVmaXRfT3B0aW9ucy5tZDs00\n",
      "chunk_id: d9167148bb62_aHR0cHM6Ly9tYWdvdHRlaWFkbHNnZW4yLmJsb2IuY29yZS53aW5kb3dzLm5ldC9pbnQtdmVjLW1hcmtkb3duL0JlbmVmaXRfT3B0aW9ucy5tZDs00_pages_0\n",
      "Reranker Score: 3.4136228561401367\n",
      "Content: Both plans offer coverage for routine physicals, well-child visits, immunizations, and other preventive\n",
      "care services. The plans also cover preventive care services such as mammograms, colonoscopies, and\n",
      "other cancer screenings.\n",
      "\n",
      "Northwind Health Plus offers more comprehensive coverage than Northwind Standard. This plan offers\n",
      "coverage for emergency services, both in-network and out-of-network, as well as mental health and\n",
      "substance abuse coverage. Northwind Standard does not offer coverage for emergency services, mental\n",
      "health and substance abuse coverage, or out-of-network services.\n",
      "\n",
      "Both plans offer coverage for prescription drugs. Northwind Health Plus offers a wider range of\n",
      "prescription drug coverage than Northwind Standard. Northwind Health Plus covers generic, brand-\n",
      "name, and specialty drugs, while Northwind Standard only covers generic and brand-name drugs.\n",
      "\n",
      "Both plans offer coverage for vision and dental services. Northwind Health Plus offers coverage for vision\n",
      "exams, glasses, and contact lenses, as well as dental exams, cleanings, and fillings. Northwind Standard\n",
      "only offers coverage for vision exams and glasses.\n",
      "\n",
      "Both plans offer coverage for medical services. Northwind Health Plus offers coverage for hospital stays,\n",
      "doctor visits, lab tests, and X-rays. Northwind Standard only offers coverage for doctor visits and lab\n",
      "tests.\n",
      "\n",
      "Northwind Health Plus is a comprehensive plan that offers more coverage than Northwind Standard.\n",
      "Northwind Health Plus offers coverage for emergency services, mental health and substance abuse\n",
      "coverage, and out-of-network services, while Northwind Standard does not. Northwind Health Plus also\n",
      "\n",
      "<!-- PageBreak -->\n",
      "\n",
      "offers a wider range of prescription drug coverage than Northwind Standard. Both plans offer coverage\n",
      "for vision and dental services, as well as medical services.\n",
      "Caption: The plans also cover preventive care services such as mammograms, colonoscopies, and other cancer screenings. <em>Northwind Health Plus </em>offers<em> more comprehensive coverage than Northwind Standard.</em> This plan offers coverage for emergency services, both in-network and out-of-network, as well as mental health and substance abuse coverage. Northwind Health.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from azure.search.documents.models import (\n",
    "    QueryType,\n",
    "    QueryCaptionType,\n",
    "    QueryAnswerType\n",
    ")\n",
    "# Semantic Hybrid Search\n",
    "query = \"Which is more comprehensive, Northwind Health Plus vs Northwind Standard?\"\n",
    "if use_ocr:\n",
    "    query = \"Who is the national director?\"\n",
    "if use_document_layout:\n",
    "    query = \"What is contoso?\"\n",
    "\n",
    "search_client = SearchClient(endpoint, index_name, credential)\n",
    "vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=1, fields=\"vector\", exhaustive=True)\n",
    "\n",
    "results = search_client.search(  \n",
    "    search_text=query,\n",
    "    vector_queries=[vector_query],\n",
    "    select=[\"parent_id\", \"chunk_id\", \"chunk\"],\n",
    "    query_type=QueryType.SEMANTIC,\n",
    "    semantic_configuration_name='my-semantic-config',\n",
    "    query_caption=QueryCaptionType.EXTRACTIVE,\n",
    "    query_answer=QueryAnswerType.EXTRACTIVE,\n",
    "    top=1\n",
    ")\n",
    "\n",
    "semantic_answers = results.get_answers()\n",
    "if semantic_answers:\n",
    "    for answer in semantic_answers:\n",
    "        if answer.highlights:\n",
    "            print(f\"Semantic Answer: {answer.highlights}\")\n",
    "        else:\n",
    "            print(f\"Semantic Answer: {answer.text}\")\n",
    "        print(f\"Semantic Answer Score: {answer.score}\\n\")\n",
    "\n",
    "for result in results:\n",
    "    print(f\"parent_id: {result['parent_id']}\")  \n",
    "    print(f\"chunk_id: {result['chunk_id']}\")  \n",
    "    print(f\"Reranker Score: {result['@search.reranker_score']}\")\n",
    "    print(f\"Content: {result['chunk']}\")  \n",
    "\n",
    "    captions = result[\"@search.captions\"]\n",
    "    if captions:\n",
    "        caption = captions[0]\n",
    "        if caption.highlights:\n",
    "            print(f\"Caption: {caption.highlights}\\n\")\n",
    "        else:\n",
    "            print(f\"Caption: {caption.text}\\n\")\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
